从csa.aviation.ge.com/tcta/index#/targetSummary网址下载TCT的Input数据,处理Input数据并画图保存至pdf文件。
下载Input数据
- 步骤
Analytics –> Target Cost Tool –> Target Baseline Summary Report –> Baseline Summary Criteria Selection (Fleet Model : GE90-Growth ; Fleet Year : 2017 ; Cost Type : All In Cost) –> Please Set The Configuration (SV Parameters : value : SHOP : TEXL offload/Wales ; SV Number : 1/2/3/4/5 ; Region : Non SEO/SEO/SEO Plus;Block:2012、2015、EIS Old、EIS New)
共下载了100个.xlsx文件,命名格式为 :Region-SHOP-SV Number,例如 :Non SEO-TEXL offload-1
处理Input数据
- 处理规则
1.处理Input中的100个.xlsx文件,生成Output中的100个.xlsx文件;Input中的每个.xlsx文件有两个sheet,分别为NON-BUM Target Summary和BUM Target Summary,每个sheet的各列为Row labels(代表Sub Module)、II、III、IV、SA;将两个sheet的各Sub Module的对应level的花费相加,生成Output中的各.xlsx文件的Target工作表。
2.将Output中的100个.xlsx文件合并为Output/Target_2017.xlsx;将TEXL 2017 60 PR SV Cost Analytics May 4.xlsx中的Target_Summary数据另存为Target_2016.xlsx文件;将Output/Target_2017和Target_2016.xlsx合并为Target.xlsx文件。
3.逆转换列,将II、III、IV、SA逆转换为WS列和Module Target Cost列,存为Target_tran.xlsx文件。
4.TCT.py1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69import pandas as pd
import os
#Statistic Input file name
file_name_Input = []
for file in os.Listdir('Input'):
file_name_Input.append(file)
print('the number of file:',len(file_name_Input))
#求和时若有一个为空无法相加时,令值为另外一个
def NONBUM_add_BUM(data, level, NONBUM, BUM):
data[level] = np.where(data[NONBUM].isnull(), data[BUM], np.where(data[BUM].isnull(), data[NONBUM], data[NONBUM] + data[BUM]))
#Handle Input per Excel
for excel_name in file_name_Input:
data_NON_BUM = pd.read_excel('Input/' + excel_name, 'NON-BUM Target Summery')
data_BUM = pd.read_excel('Input/' + excel_name, 'BUM Target Summery')
data_NON_BUM = pd.DataFrame(data_NON_BUM[data_NON_BUM['Row labels'] != 'Grand Total'])
data_BUM = pd.DataFrame(data_BUM[data_BUM['Row labels'] != 'Grand Total'])
#merge
data = pd.merge(data_NON_BUM, data_BUM, on = 'Row labels', how = 'left')
NONBUM_add_BUM(DATA, 'II', 'II_x', 'II_y')
NONBUM_add_BUM(DATA, 'III', 'III_x', 'III_y')
NONBUM_add_BUM(DATA, 'IV', 'IV_x', 'IV_y')
NONBUM_add_BUM(DATA, 'SA', 'SA_x', 'SA_y')
columns = ['NA','II_x','II_y','III_x','III_y','IV_x','IV_y','SA_x','SA_y']
for col in columns:
del data[col]
#Target Add New Columns
writer = pd.Excelwriter('Output/Sub' + excel_name)
data_NON_BUM.to_excel(writer, 'NON-BUM Target Summery', index = False)
data_BUM.to_excel(writer, 'BUM Target Summery', index = False)
data_Target = pd.DataFrame(data[data['Row labels'] != 'Grand Total'])
index = ecxel_name.find('-')
index2 = excel_name.find('-', index + 1)
index3 = excel_name.find('-', index2 + 1)
data_Target['Shop'] = excel_name[0:index]
data_Target['Block'] = excel_name[index + 1:index2]
data_Target['Region'] = excel_name[index2 + 1:index3]
data_Target['SV Count'] = excel_name[index3 + 1:index3 + 2]
data_Target['Year'] = 2017
data_Target.to_excel(writer,'Target',index = False)
writer.save()
#statistic Output Sub File Name
file_name_Output = []
for file in os.listdir('Output/Sub'):
file_name_Output.append(file)
#Combine Output Sub File
data_Target_Total = pd.DataFrame()
for excel_name in file_name_Output:
data_Target = pd.read_excel('Output/Sub/' + excel_name,'Target')
data_Target_Total = pd.concat([data_Target_Total, data_Target])
#Generate the final file
writer = pd.ExcelWriter('Output/Target_2017.xlsx')
data_Target_Tatol.to_excel(writer, 'Target', index = False)
writer.save()
#Combine the data of 2017 and 2016
target_2017 = pd.read_excel('Output/Target_2017.xlsx')
target_2016 = pd.read_excel('Target_2016.xlsx')
target_2016['Year'] = 2016
target = pd.concat([target_2017, target_2016])
writer = pd.ExcelWriter('Target.xlsx')
target.to_excel(writer,'Target', index = False)
writer.save()
#generated Target_tran.xlsx
target = pd.read_excel('Target.xlsx')
target_tran = pd.melt(target,id_vars = ['Roew labels', 'Shop', 'Block', 'Region', 'SV Count', 'Year'], var_name = 'WS', value_name = 'Module Target Cost')
none_MTC = (target_tran['Module Target Cost'].isnull())|(targte_tran['Module Target Cost'].apply(lambda x:str(x).isspace()))
target_tran = target_tran[~none_MTC]
target_tran.to_excel('Target_tran.xlsx', index = False)
画图存入pdf文件
- 简介:信息可视化(也叫绘图)是数据分析中最重要的工作之一。做一个可交互的数据可视化也许是工作的最终目标。
matplotlib是一个用于创建出版质量图表的桌面绘图包(主要是2D方面)。matplotlib支持各种操作系统上许多不同的GUI后端,而且还能将图片导出为各种常见的矢量(vector)和光栅(raster)图:PDF、SVG、JPG、PNG、BMP、GIF等。 - 任务目标:按Region、Shop、SV Count的组合画图分析,生成TCT.pdf文件的代码TCT_Draw_pdf.py如下所示:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61import pandas as pd
import numpy as np
import matplotlib.pyploy as plt
from matplotlib.backends.backend_pdf import PdfPages
target = pd.read_excel('Target_tran.xlsx')
#strip
def columns_strip(name_col):
target[name_col] = target[name_col].map(str)
target[name_col] = target[name_col].map(str.strip)
columns = ['Block', 'Region', 'Shop']
for nae_col in columns:
columns_strip(name_col)
#Draw
Region_List = ['Non SEO', 'SEO', 'SEO Plus']
Shop_List = ['TEXL', 'Wales']
SV_Count_List = [1, 2, 3, 4, 5]
#pdf
with PdfPages('TCT.pdf') as pdf:
for Region in Region_List:
for Shop in Shop_List:
for SV_Count in SV_Count_List:
try:
#Filter
target_sub = target.loc[(targete['Region'] == Region)&(target['Shop'] == shop)&(target['SV Count'] == SV_Count)]
picture_name = Region + '-' + 'Shop' + '-' + str(SV_Count)
#plot
data = pd.DataFrame(pd.pivot_table(target_sub, index = ["Row labels", "Year"], columns = ["Block"], values = ["Module Target Cost"], aggfunc = [np.mean]))
data.plot(kind = 'bar')
#x轴标签重构
xlabels = []
for i in range(len(data.index.labels[0])):
if str(data.index.levels[0][data.index.labels[0][i]]) == str(data.index.levels[0][data.index.labels[0][i-1]]):
xlabels.append(str(data.idex.levels[1][data.index,labels[1][i]]))
else:
xlabels.append(str(data.idex.levels[0][data.index,labels[0][i]]) + ' ' + str(data.idex.levels[1][data.index,labels[1][i]]))
#修改边距,wspace和hspace用于控制宽度和高度的百分比
plt.subplots_adjust(left = 0.20, wspace = 0.25, hspace = 0.25, bottom = 0.40, top = 0.92)#Adjust the display position of the graph to avoid incomplete display on abscissa
#标题
plt.title(picture_name)
#图例
plt.legend(["2012", "2015", "EIS New", "EIS Old"], title = "Block")
#x轴名称
plt.xlabel("Sub Module,Year")
#y轴名称
plt.ylabel("Arg Module Target Cost")
plt.xticks(fontsize = 6)
#获得当前的Axes对象ax
ax = plt.gca()
#设置x轴刻度标签
ax.set_xticklabels(xlabels)
#将y轴刻度标签改为货币格式
fmt = '${x:,.0f}'
tick = ticker.strMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
#画网格
plt.grid(linestyle = 'dotted')
#将当前图表保存到文件
pdf.savefig()
plt.close()
except:
pass
##计算2016年和2017年的差值
按Row labels、Shop、Block、Region、SV Count、WS计算2016年和2017年的花费差值1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34import pandas as pd
target_2016 = pd.read_excel('Target_2016.xlsx')
target_2017 = pd.read_excel('Output/Target_2017.xlsx')
#Unpivot target_2016 ws and Module Target Cost Columns
target_2016['Year'] = 2016
target_2016 = pd.melt(target_2016, id_vars = ['Row labels', 'Shop', 'Block', 'Region', 'SV Count', 'Year'], var_name = 'WS', value_name = 'Module Target Cost')
none_MTC = (target_2016['Module Target Cost'].isnull())|(targte_2016['Module Target Cost'].apply(lambda x:str(x).isspace()))
target_2016 = target_2016[~none_MTC]
target_2016.to_excel('Delta/Target_2016_unpivot.xlsx', index = False)
#Unpivot target_2017 ws and Module Target Cost Columns
target_2017 = pd.melt(target_2017, id_vars = ['Row labels', 'Shop', 'Block', 'Region', 'SV Count', 'Year'], var_name = 'WS', value_name = 'Module Target Cost')
none_MTC = (target_2017['Module Target Cost'].isnull())|(targte_2017['Module Target Cost'].apply(lambda x:str(x).isspace()))
target_2017 = target_2017[~none_MTC]
target_2017.to_excel('Delta/Target_2017_unpivot.xlsx', index = False)
#target_2016_unpivot and target_2017_unpivot
target_2016_u = pd.read_excel('Delta/Target_2016_unpivot.xlsx')
target_2017_u = pd.read_excel('Delta/Target_2017_unpivot.xlsx')
#strip
def columns_strip(target, name_col):
target[name_col] = target[name_col].map(str)
target[name_col] = target[name_col].map(str.strip)
columns = ['Row labels', 'Block', 'Region', 'Shop', 'WS']
for name_col in columns:
columns_strip(target_2016_u, name_col)
columns_strip(target_2016_u, name_col)
target = pd.merge(target_2016_u, target_2017_u, on = ['Row labels', 'Shop', 'Block', 'Region', 'SV Count', 'WS'], how = 'left')
target['Delta'] = target['Module Target Cost_y'] - target['Module Target Cost_x']
del target['Year_x']
del target['Year_y']
#Renamed
target.rename(columns = {'Row labels':'Sub Module'}, inplace = True)
target.rename(columns = {'Module Target Cost_x':'2016_cost'}, inplace = True)
target.rename(columns = {'Module Target Cost_y':'2017_cost'}, inplace = True)
target.to_excel('Delta/Target_Delta.xlsx', index = False)